Conversation

@chewy-zlai (Collaborator) commented Dec 23, 2024

Summary

Creates a console command "zipline" with subcommands "compile" and "run". This change refactors run.py from argparse to click so that both can be registered as commands in a single click group. It also updates the PyPI package name to zipline-ai.
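The new zipline.py is not reproduced in this thread; as a hedged sketch of the structure described above (per the summary, the underlying commands are extract_and_convert and run_main, exposed as "compile" and "run"; the command bodies below are placeholders, not the PR's code):

```python
import click

@click.group(name="zipline")
def zipline():
    """Top-level console command installed by the zipline-ai package."""

@zipline.command(name="compile")
def compile_cmd():
    """Placeholder body; the real logic lives in ai/chronon/repo/compile.py."""
    click.echo("compiling")

@zipline.command(name="run")
def run_cmd():
    """Placeholder body; the real logic lives in ai/chronon/repo/run.py."""
    click.echo("running")
```

Once a console-script entry point targets the `zipline` group, `zipline compile` and `zipline run` become available on the command line.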

Checklist

  • [ ] Added Unit Tests
  • [x] Covered by existing CI
  • [x] Integration tested
  • [ ] Documentation update

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new command-line interface (CLI) using the click library.
    • Added zipline command group with extract_and_convert and run_main commands.
  • Improvements

    • Enhanced command-line argument parsing and handling.
    • Updated package metadata and versioning.
    • Expanded supported Spark versions.
  • Package Changes

    • Renamed package from "chronon-ai" to "zipline-ai".
    • Updated package description and version.
  • Breaking Changes

    • Modified command-line argument structure.
    • Replaced argparse with click library for argument management.
  • Dependency Updates

    • Added multiple new dependencies, including google-cloud-storage.
    • Removed several dependencies related to Google Cloud services from development requirements.

coderabbitai bot commented Dec 23, 2024

Walkthrough

The pull request introduces a comprehensive refactoring of command-line argument handling across multiple Python files, transitioning from argparse to the click library. The changes standardize the CLI structure, update package metadata, and create a new zipline command group that consolidates various repository-related commands. The modifications enhance script flexibility, improve parameter management, and prepare the package for more streamlined command-line interactions.

Changes

File | Change Summary
  • api/py/ai/chronon/repo/run.py: Replaced argparse with click, updated function signatures, added main() function, expanded Spark version support
  • api/py/setup.py: Updated package name to zipline-ai, changed version, replaced scripts with entry_points
  • api/py/ai/chronon/repo/compile.py: Added explicit command name 'compile' to @click.command()
  • api/py/ai/chronon/repo/zipline.py: Created new CLI command group with extract_and_convert and run_main commands
  • api/py/test/test_run.py: Refactored tests to use click context instead of argparse, removed parser fixture
  • api/py/requirements/base.in: Added new dependency: google-cloud-storage==2.19.0
  • api/py/requirements/base.txt: Added multiple new dependencies related to Google Cloud and others
  • api/py/requirements/dev.in: Removed dependency: google-cloud-storage==2.19.0
  • api/py/requirements/dev.txt: Removed several dependencies related to Google Cloud services

Possibly Related PRs

Suggested Reviewers

  • nikhil-zlai
  • piyush-zlai
  • tchow-zlai

Poem

🚀 Zipline soars on CLI wings so bright,
Click commands dance with newfound might,
From argparse to a sleeker design,
Code flows smooth, a digital shrine!
Refactoring magic takes its flight 🌈

Warning

Review ran into problems

🔥 Problems

GitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
Dockerfile (3)

79-79: Use COPY instead of ADD to follow best practices.

-ADD api/py /srv/chronon
+COPY api/py /srv/chronon
🧰 Tools
🪛 Hadolint (2.12.0)

[error] 79-79: Use COPY instead of ADD for files and folders

(DL3020)


80-81: Installing requirements early can benefit layer caching.


82-83: COPY for individual scripts is fine; consider consolidating copies.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4755a5d and f390818.

📒 Files selected for processing (6)
  • Dockerfile (1 hunks)
  • api/py/ai/chronon/repo/explore.py (2 hunks)
  • api/py/ai/chronon/repo/run.py (2 hunks)
  • api/py/setup.py (2 hunks)
  • docker-compose.yml (1 hunks)
  • quickstart/requirements.txt (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • quickstart/requirements.txt
🧰 Additional context used
🪛 Hadolint (2.12.0)
Dockerfile

[error] 79-79: Use COPY instead of ADD for files and folders

(DL3020)

🔇 Additional comments (9)
api/py/setup.py (3)

29-29: Version update is valid and aligns with semantic versioning.


55-61: Console scripts are an improvement over scripts= for modular CLI.


65-65: Renaming package to 'zipline-ai' is consistent with PR objectives.

api/py/ai/chronon/repo/run.py (2)

626-626: New main() function improves clarity and testing feasibility.


725-726: Entry point call is correct for script execution.

docker-compose.yml (1)

37-38: Switch to 'build' instead of pre-built image is good for local iteration.

Dockerfile (1)

85-85: Environment variables set the runtime path cleanly.

api/py/ai/chronon/repo/explore.py (2)

Line range hint 386-416: Encapsulating logic in a main() is nice.
This structuring makes the script more testable and clearer.


417-418: Good use of main() for script execution.
This avoids cluttering the global namespace. Verify that CLI args are parsed as intended.

✅ Verification successful

Let me check the main function implementation and CLI argument parsing.


CLI args are properly parsed and handled.
The main function correctly parses required and optional arguments using argparse:

  • Required: keyword for lookup
  • Optional: --conf-root with default value
  • Optional: --handler-args for special handlers
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Find the main function implementation
ast-grep --pattern 'def main():
  $$$'

Length of output: 13201


Script:

#!/bin/bash
# Look for argument parsing
rg "ArgumentParser|argparse|sys.argv" -A 3

Length of output: 2673


Script:

#!/bin/bash
# Get the full content of explore.py to understand the context
cat api/py/ai/chronon/repo/explore.py

Length of output: 14140

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
api/py/ai/chronon/repo/explore.py (2)

Line range hint 387-417: LGTM! Consider adding error handling.

Clean refactoring to improve modularity. Consider wrapping the main execution in a try-except block to handle potential exceptions gracefully.

 def main():
+    try:
         parser = argparse.ArgumentParser(description="Explore tool for chronon")
         parser.add_argument("keyword", help="Keyword to look up keys")
         parser.add_argument("--conf-root", help="Conf root for the configs", default=CWD)
         parser.add_argument(
             "--handler-args", nargs="*", help="Special arguments for handler keywords of the form param=value")
         args = parser.parse_args()
         root = args.conf_root
         if not (root.endswith("chronon") or root.endswith("zipline")):
             print("This script needs to be run from chronon conf root - with folder named 'chronon' or 'zipline', found: "
                   + root)
         teams = load_team_data(os.path.join(root, 'teams.json'))
         gb_index = build_index("group_bys", GB_INDEX_SPEC, root=root, teams=teams)
         join_index = build_index("joins", JOIN_INDEX_SPEC, root=root, teams=teams)
         enrich_with_joins(gb_index, join_index, root=root, teams=teams)

         candidate = args.keyword
         if candidate in handlers:
             print(f"{candidate} is a registered handler")
             handler = handlers[candidate]
             handler_args = {}
             for arg in args.handler_args:
                 splits = arg.split("=", 1)
                 assert len(splits) == 2, f"need args to handler for the form, param=value. Found and invalid arg:{arg}"
                 key, value = splits
                 handler_args[key] = value
             handler(**handler_args)
         else:
             group_bys = find_in_index(gb_index, args.keyword)
             display_entries(group_bys, args.keyword, root=root, trim_paths=True)
+    except Exception as e:
+        print(f"Error: {e}")
+        return 1
+    return 0

419-420: LGTM! Consider using the return code.

Clean entry point. Consider using the return code from main().

 if __name__ == "__main__":
-    main()
+    exit(main())
api/py/ai/chronon/repo/run.py (1)

295-298: Adopt ternary for brevity.

Use the suggested ternary operator for simpler logic:

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ce2f019 and 01ac2aa.

📒 Files selected for processing (5)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/explore.py (2 hunks)
  • api/py/ai/chronon/repo/run.py (10 hunks)
  • api/py/ai/chronon/repo/zipline.py (1 hunks)
  • api/py/setup.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

🔇 Additional comments (9)
api/py/ai/chronon/repo/zipline.py (3)

1-2: Looks good.

Imports are straightforward and aligned with the new CLI usage.


7-9: Neat CLI group definition.

A clear group name "zipline" neatly organizes subcommands.


12-13: Commands bound successfully.

Subcommands integrate well via add_command.

api/py/ai/chronon/repo/compile.py (1)

51-51: Command name changed to 'compile'.

This improves the CLI usage consistency and readability.

api/py/setup.py (3)

55-59: Entry point is well-defined.

Centralized CLI entry point with zipline is a good move for discoverability.


60-60: Updated package description.

Switched to "Zipline python API library" aligns with rebranding.


63-63: Renamed package to "zipline-ai".

This completes the naming alignment with the new brand.

api/py/ai/chronon/repo/run.py (2)

627-669: CLI restructured with Click.

Main function and options are well-organized and easy to extend.


679-685: Jar download logic.

The fallback to download_jar is clear, ensuring correctness for embedded vs. uber jars.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
api/py/ai/chronon/repo/run.py (4)

295-298: Simplify context assignment using ternary operator

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


383-391: Extract online jar fetching logic

Move the online jar fetching logic into a separate method for better readability and maintainability.

 def __init__(self, args, jar_path):
     self.online_jar = args["online_jar"]
-    valid_jar = args["online_jar"] and os.path.exists(args["online_jar"])
-    if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
-        print("Downloading online_jar")
-        self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode(
-            "utf-8"
-        )
-        os.environ["CHRONON_ONLINE_JAR"] = self.online_jar
-        print("Downloaded jar to {}".format(self.online_jar))
+    self._fetch_online_jar_if_needed(args)

+def _fetch_online_jar_if_needed(self, args):
+    valid_jar = args["online_jar"] and os.path.exists(args["online_jar"])
+    if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
+        print("Downloading online_jar")
+        self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode("utf-8")
+        os.environ["CHRONON_ONLINE_JAR"] = self.online_jar
+        print("Downloaded jar to {}".format(self.online_jar))

601-624: Consider centralizing default values

Default values are currently set in both set_defaults and click options. Consider moving all defaults to a central configuration.

Also applies to: 626-665
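The review does not prescribe a mechanism for centralizing defaults; one common click approach (a sketch, not the PR's code) is the group's default_map, which feeds per-command option defaults from a single dict:

```python
import click

# Single source of truth for per-command option defaults.
DEFAULTS = {"run": {"env": "dev"}}

@click.group(context_settings={"default_map": DEFAULTS})
def cli():
    """Parent group; click threads default_map down into subcommand contexts."""

@cli.command(name="run")
@click.option("--env")  # no default here; click looks it up in DEFAULTS
def run(env):
    click.echo(f"env={env}")
```

With this layout, `set_defaults` and the individual `@click.option(default=...)` values would no longer need to agree by convention.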


283-290: Enhance error message for invalid conf path

Add more context to the error message by including the expected path format.

-    logging.error(
-        "Invalid conf path: {}, please ensure to supply the relative path to zipline/ folder".format(
-            params["conf"]
-        )
-    )
+    logging.error(
+        "Invalid conf path: {}. Expected format: <context>/<conf_type>/<team>/<name> relative to zipline/ folder".format(
+            params["conf"]
+        )
+    )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 01ac2aa and f35985d.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/run.py (8 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
api/py/ai/chronon/repo/run.py (6)

196-200: Add version format validation

Add input validation for the version parameter to ensure it matches the expected format (e.g., X.Y.Z).

 def download_jar(
         version,
         jar_type="uber",
         release_tag=None,
         spark_version="2.4.0",
         skip_download=False,
 ):
+    if version and version != "latest":
+        assert re.match(r'^\d+\.\d+\.\d+$', version), f"Invalid version format: {version}. Expected format: X.Y.Z"

295-298: Simplify context assignment

Use a ternary operator for better readability.

-                if params["env"]:
-                    context = params["env"]
-                else:
-                    context = "dev"
+                context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


384-390: Improve online jar validation error handling

Add explicit error message when online jar validation fails.

         valid_jar = args["online_jar"] and os.path.exists(args["online_jar"])
         # fetch online jar if necessary
-        if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
+        if self.mode in ONLINE_MODES and not args["sub_help"]:
+            if not valid_jar and not args["online_jar_fetch"]:
+                raise ValueError(f"Online jar not found at {args['online_jar']} and no fetch command provided")
             print("Downloading online_jar")

579-580: Add date format validation

Add explicit date format validation before processing.

+    def validate_date_format(date_str):
+        try:
+            return datetime.strptime(date_str, "%Y-%m-%d")
+        except ValueError:
+            raise ValueError(f"Invalid date format: {date_str}. Expected format: YYYY-MM-DD")
+
+    start_date = validate_date_format(start_date)
+    end_date = validate_date_format(end_date)
-    start_date = datetime.strptime(start_date, "%Y-%m-%d")
-    end_date = datetime.strptime(end_date, "%Y-%m-%d")

627-668: Improve help messages clarity

The help messages for some options could be more descriptive.

-@click.option("--ds", help="the end partition to backfill the data", default=datetime.today().strftime("%Y-%m-%d"))
+@click.option("--ds", help="End partition date for data backfill (format: YYYY-MM-DD)", 
+              default=datetime.today().strftime("%Y-%m-%d"))

679-688: Move jar path expansion earlier

Move the path expansion before the jar download to avoid potential issues.

     jar_path = (
-        chronon_jar
+        os.path.expanduser(chronon_jar)
         if chronon_jar
         else download_jar(
             version,
             jar_type=jar_type,
             release_tag=release_tag,
             spark_version=os.environ.get("SPARK_VERSION", spark_version),
         )
     )
-    Runner(ctx.params, os.path.expanduser(jar_path)).run()
+    Runner(ctx.params, jar_path).run()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f35985d and fef3186.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/run.py (10 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (2)
api/py/ai/chronon/repo/run.py (2)

Line range hint 244-273: Improve error handling in environment setup.

The environment setup lacks proper error handling for missing or invalid environment variables.

Add try-catch blocks and validation:

 def set_runtime_env(params):
+    if not isinstance(params, dict):
+        raise ValueError("params must be a dictionary")
     effective_mode = params["mode"]
+    if effective_mode is None:
+        raise ValueError("mode parameter is required")
🧰 Tools
🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


Line range hint 488-497: Add retry mechanism for streaming app verification.

The streaming app verification lacks retry logic, which could fail due to temporary network issues.

+@retry_decorator(retries=3, backoff=20)
+def verify_running_apps(app_name, list_apps_cmd):
+    running_apps = check_output("{}".format(list_apps_cmd)).decode("utf-8").split("\n")
+    return [json.loads(app.strip()) for app in running_apps if app.strip()]

 if len(filtered_apps) > 0:
     if self.mode == "streaming":
         assert len(filtered_apps) == 1, "More than one found, please kill them all"
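The retry_decorator used in the suggestion above is not defined anywhere in this diff; a minimal sketch of one (the name and parameters mirror the suggestion, the fixed-interval backoff policy is an assumption) could be:

```python
import functools
import time

def retry_decorator(retries=3, backoff=20):
    """Retry the wrapped function up to `retries` times, sleeping `backoff`
    seconds between attempts; re-raise the last exception if all attempts fail."""
    def wrapper(fn):
        @functools.wraps(fn)
        def wrapped(*args, **kwargs):
            last_exc = None
            for attempt in range(retries):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    if attempt < retries - 1:
                        time.sleep(backoff)
            raise last_exc
        return wrapped
    return wrapper
```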
🧹 Nitpick comments (3)
api/py/ai/chronon/repo/run.py (3)

295-298: Simplify context assignment using ternary operator.

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


627-688: Enhance CLI documentation and type validation.

The CLI options lack type validation and comprehensive documentation.

 @click.option("--parallelism",
+              type=click.IntRange(min=1),
               help="break down the backfill range into this number of tasks in parallel. "
                    "Please use it along with --start-ds and --end-ds and only in manual mode")
 @click.option("--repo",
+              type=click.Path(exists=True, file_okay=False, dir_okay=True),
               help="Path to chronon repo",
               default=os.environ.get("CHRONON_REPO_PATH", "."))

Line range hint 579-597: Optimize date range splitting for large ranges.

The date range splitting function could be optimized for memory efficiency with large date ranges.

 def split_date_range(start_date, end_date, parallelism):
+    if not isinstance(parallelism, int) or parallelism < 1:
+        raise ValueError("Parallelism must be a positive integer")
     start_date = datetime.strptime(start_date, "%Y-%m-%d")
     end_date = datetime.strptime(end_date, "%Y-%m-%d")
-    date_ranges = []
+    # ceil((inclusive day count) / parallelism) days per split
+    split_size = ((end_date - start_date).days + parallelism) // parallelism
+    for i in range(parallelism):
+        split_start = start_date + timedelta(days=i * split_size)
+        split_end = end_date if i == parallelism - 1 else split_start + timedelta(days=split_size - 1)
+        yield (split_start.strftime("%Y-%m-%d"), split_end.strftime("%Y-%m-%d"))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between fef3186 and ba5bcc9.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/run.py (11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

Comment on lines +379 to +384
self.repo = args["repo"]
self.conf = args["conf"]
self.sub_help = args["sub_help"]
self.mode = args["mode"]
self.online_jar = args["online_jar"]
valid_jar = args["online_jar"] and os.path.exists(args["online_jar"])

⚠️ Potential issue

Validate input parameters in Runner initialization.

The Runner class accepts parameters without validation.

Add parameter validation:

 def __init__(self, args, jar_path):
+    if not isinstance(args, dict):
+        raise ValueError("args must be a dictionary")
+    if not jar_path or not isinstance(jar_path, str):
+        raise ValueError("jar_path must be a non-empty string")
     self.repo = args["repo"]

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/run.py (2)

295-298: Simplify context assignment using ternary operator.

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


673-675: Add docstring to main function.

Add documentation describing the function's purpose and parameters.

 def main(ctx, conf, env, mode, ds, app_name, start_ds, end_ds, parallelism, repo, online_jar, online_class, version,
          spark_version, spark_submit_path, spark_streaming_submit_path, online_jar_fetch, sub_help, conf_type,
          online_args, chronon_jar, release_tag, list_apps, render_info):
+    """Execute the Chronon pipeline with the specified configuration.
+    
+    Args:
+        ctx: Click context object
+        conf: Configuration path
+        env: Environment (dev/prod)
+        [...]
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between ba5bcc9 and 80384d9.

📒 Files selected for processing (2)
  • api/py/ai/chronon/repo/compile.py (1 hunks)
  • api/py/ai/chronon/repo/run.py (11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/py/ai/chronon/repo/compile.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

🔇 Additional comments (2)
api/py/ai/chronon/repo/run.py (2)

583-584: LGTM: Correct handling of inclusive date ranges.


383-388: 🛠️ Refactor suggestion

Add parameter validation in Runner initialization.

Add checks for required parameters to prevent runtime errors.

 def __init__(self, args, jar_path):
+    if not isinstance(args, dict):
+        raise ValueError("args must be a dictionary")
+    if not jar_path:
+        raise ValueError("jar_path is required")
     self.repo = args["repo"]
     self.conf = args["conf"]

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (6)
api/py/ai/chronon/repo/run.py (2)

295-296: Simplify context assignment using ternary operator.

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


605-632: Consider environment variable prefix constants.

Define a constant for the environment variable prefix "CHRONON_" to avoid string duplication.

+ENV_PREFIX = "CHRONON_"
 def set_defaults(ctx):
     """Set default values based on environment"""
-    chronon_repo_path = os.environ.get("CHRONON_REPO_PATH", ".")
+    chronon_repo_path = os.environ.get(f"{ENV_PREFIX}REPO_PATH", ".")
api/py/test/test_run.py (4)

31-47: Consider using a pytest fixture.
This helper could be converted into a fixture for more flexible reuse.
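For click-based tests, a hedged sketch of such a fixture using click's CliRunner (the helper's real signature in test_run.py is not shown here; the inner command is a hypothetical smoke test):

```python
import click
import pytest
from click.testing import CliRunner

@pytest.fixture
def cli_runner():
    """One CliRunner shared by every test that invokes click commands."""
    return CliRunner()

def test_invoke_smoke(cli_runner):
    # Any click command can be exercised through the fixture like this.
    @click.command()
    def hello():
        click.echo("hello")

    result = cli_runner.invoke(hello)
    assert result.exit_code == 0
    assert "hello" in result.output
```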


83-128: Reduce environment reset duplication.
Repeated code for resetting environment variables can be placed in a shared setup or fixture.

Also applies to: 144-149, 159-164, 179-183, 188-192
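One way to place that reset in shared setup (a sketch; the variable list below is hypothetical, not taken from test_run.py) is an autouse fixture built on pytest's monkeypatch, which undoes its changes after each test:

```python
import pytest

# Hypothetical list; the exact variables test_run.py resets are not shown here.
ENV_VARS_TO_RESET = ["CHRONON_REPO_PATH", "CHRONON_ONLINE_JAR", "SPARK_VERSION"]

@pytest.fixture(autouse=True)
def clean_env(monkeypatch):
    """Delete Chronon-related environment variables before each test so state
    from one test cannot leak into the next."""
    for var in ENV_VARS_TO_RESET:
        monkeypatch.delenv(var, raising=False)
```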


169-175: Use a more specific exception.
Catching Exception is too broad. Consider a narrower exception class for clarity.

🧰 Tools
🪛 Ruff (0.8.2)

169-169: pytest.raises(Exception) should be considered evil

(B017)


Line range hint 271-328: Consider unifying repeated steps.
Streaming and streaming-client tests share logic. A helper could reduce duplication.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 80384d9 and 7679303.

📒 Files selected for processing (2)
  • api/py/ai/chronon/repo/run.py (11 hunks)
  • api/py/test/test_run.py (8 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

api/py/test/test_run.py

169-169: pytest.raises(Exception) should be considered evil

(B017)

🔇 Additional comments (9)
api/py/ai/chronon/repo/run.py (5)

196-200: LGTM: Download jar function signature and validation.

The function correctly validates spark versions against supported versions.

Also applies to: 203-203


383-388: Add input validation in Runner initialization.

The past review comment about validating input parameters in Runner initialization is still valid.

Also applies to: 390-392


410-412: LGTM: Mode validation logic.

The code properly validates modes against possible modes for the configuration type.

Also applies to: 415-416


583-584: LGTM: Date range calculation.

The date range calculation correctly includes the end date.


634-701: LGTM: Click command setup.

The command-line interface is well-structured with appropriate options and defaults.

api/py/test/test_run.py (4)

19-19: Good import for Click usage.


197-210: Looks good.
No issues spotted with property defaults logic.


213-241: Clean parameter management.
Render info testing appears consistent with the new approach.


Line range hint 244-264: No concerns.
Tests properly validate rendering script usage.

Comment on lines +270 to +274
effective_mode = params["mode"]
if effective_mode and "streaming" in effective_mode:
effective_mode = "streaming"
if args.repo:
teams_file = os.path.join(args.repo, "teams.json")
if params["repo"]:
teams_file = os.path.join(params["repo"], "teams.json")

⚠️ Potential issue

Add error handling for missing repo path.

The code assumes repo path exists without validation.

 if params["repo"]:
+    if not os.path.exists(params["repo"]):
+        raise ValueError(f"Repository path does not exist: {params['repo']}")
     teams_file = os.path.join(params["repo"], "teams.json")

Also applies to: 281-287

Comment on lines +426 to +428
self.args = args["args"] if args["args"] else ""
self.online_class = args["online_class"]
self.app_name = args["app_name"]

⚠️ Potential issue

Validate paths before assignment.

File paths should be validated before assignment.

-self.spark_submit = args["spark_submit_path"]
+if not os.path.exists(args["spark_submit_path"]):
+    raise ValueError(f"Spark submit path does not exist: {args['spark_submit_path']}")
+self.spark_submit = args["spark_submit_path"]

Also applies to: 430-430, 433-435, 437-438

@chewy-zlai chewy-zlai changed the title Rename compile.py and Build zipline-ai Build zipline-ai with Commands "zipline compile" and "zipline run" Jan 5, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
api/py/ai/chronon/repo/run.py (3)

196-200: Add input validation for spark_version.

The assertion message is good, but add upfront validation.

 def download_jar(
         version,
         jar_type="uber",
         release_tag=None,
         spark_version="2.4.0",
         skip_download=False,
 ):
+    if not isinstance(spark_version, str):
+        raise ValueError(f"spark_version must be a string, got {type(spark_version)}")
     assert (
         spark_version in SUPPORTED_SPARK
     ), f"Received unsupported spark version {spark_version}. Supported spark versions are {SUPPORTED_SPARK}"

Also applies to: 203-203


295-296: Use ternary operator for context assignment.

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


583-584: Collapse the wrapped date-difference expression.

The subtraction is already correct; the suggestion only simplifies its formatting.

-    total_days = (
-                         end_date - start_date
-                 ).days + 1
+    total_days = (end_date - start_date).days + 1
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 7679303 and 1d52e11.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/run.py (11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

🔇 Additional comments (5)
api/py/ai/chronon/repo/run.py (5)

270-274: Add path validation for repo and conf.

The code assumes paths exist without proper validation.

Also applies to: 281-287


383-388: Add input validation for Runner initialization.

Also applies to: 390-392


415-416: Validate paths before assignment.

Also applies to: 418-418, 421-422, 426-428, 430-430, 433-435, 437-438


605-632: LGTM! Good default configuration setup.

The function properly handles environment variables and default paths.


634-696: Well-structured CLI using Click.

Good job on:

  • Comprehensive options with clear help messages
  • Environment variable fallbacks
  • Flexible argument handling with allow_extra_args

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
api/py/ai/chronon/repo/run.py (1)

Line range hint 307-311: Add path traversal protection.

Validate conf_path to prevent directory traversal attacks.

 conf_path = os.path.join(params["repo"], params["conf"])
+conf_path = os.path.abspath(conf_path)
+if not conf_path.startswith(os.path.abspath(params["repo"])):
+    raise ValueError("Invalid conf path: Attempted path traversal")
 if os.path.isfile(conf_path):
🧰 Tools
🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/run.py (2)

300-303: Simplify context assignment.

Use a ternary operator for better readability.

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


793-827: Add docstring and type hints to main function.

The main function would benefit from:

  • A comprehensive docstring explaining the command's purpose
  • Type hints for all parameters
@click.command(name="run", context_settings=dict(allow_extra_args=True, ignore_unknown_options=True))
+def main(
+    ctx: click.Context,
+    conf: Optional[str],
+    env: str,
+    # ... other parameters with type hints
+) -> None:
+    """Execute Chronon jobs with specified configuration and parameters.
+    
+    Args:
+        ctx: Click context for handling command state
+        conf: Configuration path relative to repo
+        env: Environment (dev/prod)
+        # ... document other parameters
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 9e015ee and a2cb521.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/run.py (12 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (2)
api/py/ai/chronon/repo/run.py (2)

388-393: Add parameter validation in Runner initialization.

Critical parameters should be validated to prevent runtime errors.


64-65: Verify Spark 3.5.1 compatibility.

The addition of Spark 3.5.1 support needs validation.

✅ Verification successful

Spark 3.5.1 support properly configured

  • Build files correctly define and use Spark 3.5.1
  • Version validation and Scala mappings are consistent
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if Spark 3.5.1 is properly supported across the codebase
rg -g '*.{scala,java}' 'spark.*3\.5\.1'

Length of output: 41


Script:

#!/bin/bash
# Check build files and Python code for Spark version references
echo "=== Build Files ==="
fd -e xml -e sbt . | xargs rg "3\.5\.1|spark"

echo -e "\n=== Python Files ==="
rg -g "*.py" "spark.*version|SUPPORTED_SPARK"

echo -e "\n=== Test Files ==="
fd -e py "test_" | xargs rg "spark.*version|3\.5\.1"

Length of output: 5601


Script:

#!/bin/bash
# Check for version-specific implementations
rg -g "*.{scala,py}" "(?:if|match).*(?:spark|version).*3\." -A 3

Length of output: 66

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
api/py/ai/chronon/repo/run.py (1)

388-393: ⚠️ Potential issue

Validate input parameters in Runner initialization.

Add parameter validation for args and jar_path.

🧹 Nitpick comments (1)
api/py/ai/chronon/repo/run.py (1)

300-303: Use ternary operator for conciseness.

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a2cb521 and cc331be.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/run.py (12 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (3)
api/py/ai/chronon/repo/run.py (3)

64-65: LGTM: Spark 3.5.1 support added correctly.


794-846: LGTM: Well-structured click command setup.

Command options are comprehensive and well-documented.


275-279: ⚠️ Potential issue

Add path validation.

Validate repository path before accessing files.

 if params["repo"]:
+    if not os.path.exists(params["repo"]):
+        raise ValueError(f"Repository path does not exist: {params['repo']}")
     teams_file = os.path.join(params["repo"], "teams.json")

Likely invalid or redundant comment.

Comment on lines +395 to 399
if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
print("Downloading online_jar")
self.online_jar = check_output("{}".format(args.online_jar_fetch)).decode(
self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode(
"utf-8"
)

⚠️ Potential issue

Add error handling for jar download.

Add try-except block to handle download failures gracefully.

 if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
     print("Downloading online_jar")
+    try:
         self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode(
             "utf-8"
         )
+    except subprocess.CalledProcessError as e:
+        raise RuntimeError(f"Failed to download online jar: {e}")
     os.environ["CHRONON_ONLINE_JAR"] = self.online_jar

Committable suggestion skipped: line range outside the PR's diff.
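A runnable sketch of the try/except the diff above proposes, isolated from the Runner class. The wrapper name `fetch_online_jar` is hypothetical:

```python
import subprocess

def fetch_online_jar(fetch_cmd):
    # Run the jar-fetch command and surface a clear error instead of an
    # unhandled CalledProcessError when the download script fails.
    try:
        return subprocess.check_output(fetch_cmd, shell=True).decode("utf-8").strip()
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Failed to download online jar: {e}") from e
```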

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
api/py/ai/chronon/repo/run.py (2)

288-294: Improve error message specificity.

The error message could be more helpful by suggesting the correct path format.

-                    logging.error(
-                        "Invalid conf path: {}, please ensure to supply the relative path to zipline/ folder".format(
-                            params["conf"]
-                        )
-                    )
+                    logging.error(
+                        "Invalid conf path: {}. Expected format: <context>/<conf_type>/<team>/<name>".format(
+                            params["conf"]
+                        )
+                    )

300-303: Simplify context assignment.

Use a ternary operator for cleaner code.

-                if params["env"]:
-                    context = params["env"]
-                else:
-                    context = "dev"
+                context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between cc331be and 8e23f46.

📒 Files selected for processing (1)
  • api/py/ai/chronon/repo/run.py (12 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
api/py/ai/chronon/repo/run.py (5)

21-21: LGTM! Good choice of CLI framework.

The transition to Click is a good choice as it provides better CLI structuring capabilities.
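For context, the group-plus-subcommand structure this PR adopts can be sketched in a few lines of Click. This is an illustrative skeleton, not the PR's actual code; option names beyond `--env` and the echoed strings are assumptions:

```python
import click

@click.group()
def zipline():
    """Top-level group, analogous to the PR's `zipline` console command."""

@zipline.command()
def compile():
    # Stand-in for the real extract-and-convert logic.
    click.echo("compiling configs")

@zipline.command(context_settings=dict(allow_extra_args=True, ignore_unknown_options=True))
@click.option("--env", default="dev", help="Running environment (default: dev).")
def run(env):
    # Stand-in for run.py's main(); extra args pass through via the context.
    click.echo(f"running in {env} mode")
```

Registering `zipline` as a console-script entry point then exposes `zipline compile` and `zipline run` on the command line.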


388-393: Add input validation.

Previous review comment about parameter validation remains applicable.


794-828: LGTM! Well-structured CLI implementation.

The click command is well-organized with descriptive help texts and appropriate option types.


431-443: ⚠️ Potential issue

Validate paths before assignment.

Add existence checks for spark_submit paths.

+        def _validate_path(path, name):
+            if not os.path.exists(path):
+                raise ValueError(f"Invalid {name}: {path}")
+
         if self.mode == "streaming":
+            _validate_path(args["spark_streaming_submit_path"], "spark streaming submit path")
             self.spark_submit = args["spark_streaming_submit_path"]
         elif self.mode == "info":
             assert os.path.exists(
                 args["render_info"]
             ), "Invalid path for the render info script: {}".format(args["render_info"])
             self.render_info = args["render_info"]
         else:
+            _validate_path(args["spark_submit_path"], "spark submit path")
             self.spark_submit = args["spark_submit_path"]

Likely invalid or redundant comment.


64-65: Verify Spark 3.5.1 compatibility.

While adding Spark 3.5.1 support is good, ensure all features work with this version.

Run this to check for potential compatibility issues:

✅ Verification successful

Spark 3.5.1 compatibility verified successfully.

The codebase properly handles Spark 3.5.x specific changes and uses stable APIs. Version-specific features like error messages are correctly implemented in tests.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Spark version-specific code or configurations
rg -A 3 "spark.*3\.[0-9]" 

Length of output: 5105


Script:

#!/bin/bash
# Check for SQL queries and potential deprecated features
rg -g '*.{scala,java,py}' '(?i)spark\.sql|createDataFrame|withColumn|udf|rdd\.' -A 2

# Look for test configurations
fd -e scala -e java test$ -x cat {} 

Length of output: 76729


Script:

#!/bin/bash
# Check for test configurations and Spark version settings
rg -g '*.{scala,java,xml,conf}' 'SparkSession\.builder|spark\.version|spark-version'

Length of output: 361

@chewy-zlai chewy-zlai merged commit 588627c into main Jan 10, 2025
4 of 5 checks passed
@chewy-zlai chewy-zlai deleted the chewys/setup_python_build branch January 10, 2025 01:17
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
api/py/requirements/base.txt (1)

1-7: Consider pinning pip-compile-multi version

Add pip-compile-multi version to dev requirements for reproducible builds.

api/py/ai/chronon/repo/run.py (2)

300-303: Simplify context assignment using ternary operator.

-if params["env"]:
-    context = params["env"]
-else:
-    context = "dev"
+context = params["env"] if params["env"] else "dev"
🧰 Tools
🪛 Ruff (0.8.2)

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)


826-828: Add type hints to main function parameters.

Improve code maintainability by adding type hints.

-def main(ctx, conf, env, mode, dataproc, ds, app_name, start_ds, end_ds, parallelism, repo, online_jar, online_class,
-         version, spark_version, spark_submit_path, spark_streaming_submit_path, online_jar_fetch, sub_help, conf_type,
-         online_args, chronon_jar, release_tag, list_apps, render_info):
+def main(ctx: click.Context, conf: str, env: str, mode: str, dataproc: bool, ds: str, app_name: str, start_ds: str,
+         end_ds: str, parallelism: int, repo: str, online_jar: str, online_class: str, version: str,
+         spark_version: str, spark_submit_path: str, spark_streaming_submit_path: str, online_jar_fetch: str,
+         sub_help: bool, conf_type: str, online_args: str, chronon_jar: str, release_tag: str,
+         list_apps: str, render_info: str) -> None:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 8e23f46 and 20bbf1e.

📒 Files selected for processing (2)
  • api/py/ai/chronon/repo/run.py (12 hunks)
  • api/py/requirements/base.txt (1 hunks)
🧰 Additional context used
🪛 LanguageTool
api/py/requirements/base.txt

[duplication] ~24-~24: Possible typo: you repeated a word.
Context: ...rage google-cloud-core==2.4.1 # via google-cloud-storage google-cloud-storage==2.19.0 # via -r requirements/base....

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~30-~30: Possible typo: you repeated a word.
Context: ...ia # google-cloud-storage # google-resumable-media google-resumable-media==2.7.2 # via google-cloud-storage g...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~57-~57: Possible typo: you repeated a word.
Context: ...# via google-auth six==1.17.0 # via thrift thrift==0.21.0 # via -r requirements/base....

(ENGLISH_WORD_REPEAT_RULE)

🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py

300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block

Replace if-else-block with context = params["env"] if params["env"] else "dev"

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
api/py/requirements/base.txt (2)

50-54: Check requests package security advisories

Version 2.32.3 is recent but verify security status.

✅ Verification successful

Requests package 2.32.3 is secure

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for CVEs
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "requests", severities: [CRITICAL, HIGH]) {
    nodes {
      advisory {
        summary
        severity
      }
      vulnerableVersionRange
    }
  }
}'

Length of output: 426


25-26: Verify google-cloud-storage version compatibility

Version 2.19.0 is relatively recent but check compatibility with your Python version.

✅ Verification successful

Python version compatibility verified

The google-cloud-storage 2.19.0 requirement (Python >=3.7) matches the project's Python version specification.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check minimum Python version requirement
curl -s https://pypi.org/pypi/google-cloud-storage/2.19.0/json | jq '.info.requires_python'

Length of output: 99


Script:

#!/bin/bash
# Find Python version specs
fd -e toml -e py -e cfg -e txt -e yaml -e yml -x grep -l "python" {} \; | xargs -I {} sh -c 'echo "=== {}:"; cat {}'

Length of output: 68129

api/py/ai/chronon/repo/run.py (3)

64-65: LGTM: Spark 3.5.1 support added correctly.

The mapping to Scala 2.12 for Spark 3.5.1 is accurate.


388-393: Validate input parameters in Runner initialization.

The Runner class still accepts parameters without validation.


275-278: ⚠️ Potential issue

Add path validation for repo directory.

Add existence check for the repo directory before accessing files within it.

 effective_mode = params["mode"]
 if effective_mode and "streaming" in effective_mode:
     effective_mode = "streaming"
+if params["repo"] and not os.path.exists(params["repo"]):
+    raise ValueError(f"Repository path does not exist: {params['repo']}")
 if params["repo"]:

Likely invalid or redundant comment.

@coderabbitai coderabbitai bot mentioned this pull request Jan 12, 2025
4 tasks
@coderabbitai coderabbitai bot mentioned this pull request Jan 21, 2025
4 tasks
@coderabbitai coderabbitai bot mentioned this pull request Apr 1, 2025
4 tasks
This was referenced Apr 17, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
)

## Summary
Creates console command "zipline" with subcommands "compile" and "run".
This change refactors run.py from argparse to click so both can be
commands for a click group. This change also updates the pypi package
name to zipline-ai.

## Checklist
- [ ] Added Unit Tests
- [ x] Covered by existing CI
- [ x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
- Introduced a new command-line interface (CLI) using the `click`
library.
- Added `zipline` command group with `extract_and_convert` and
`run_main` commands.

- **Improvements**
  - Enhanced command-line argument parsing and handling.
  - Updated package metadata and versioning.
  - Expanded supported Spark versions.

- **Package Changes**
  - Renamed package from "chronon-ai" to "zipline-ai".
  - Updated package description and version.

- **Breaking Changes**
  - Modified command-line argument structure.
  - Replaced `argparse` with `click` library for argument management.

- **Dependency Updates**
  - Added multiple new dependencies, including `google-cloud-storage`.
- Removed several dependencies related to Google Cloud services from
development requirements.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
  - https://app.asana.com/0/0/1209001371150237

---------

Co-authored-by: tchow <[email protected]>
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai added a commit that referenced this pull request May 15, 2025
chewy-zlai added a commit that referenced this pull request May 16, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jun 6, 2025
4 tasks